Improving the quality of Gujarati-Hindi Machine Translation through part-of-speech tagging and stemmer-assisted transliteration
Abstract
Machine translation for Indian languages is an emerging research area. Transliteration is one of the modules that must be designed when building a translation system. Transliteration is the mapping of source-language text into the script of the target language. A simple mapping, however, decreases the efficiency of the overall translation system. We therefore propose the use of stemming and part-of-speech tagging for transliteration: the effectiveness of translation can be improved by part-of-speech tagging and stemmer-assisted transliteration. We show that much of the Gujarati content gets transliterated while being processed for translation into Hindi.
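The abstract contrasts a simple script mapping with transliteration assisted by stemming and part-of-speech tags. The sketch below illustrates that general idea; it is not the authors' implementation, and several pieces are assumptions. The baseline mapping relies on the fact that Unicode encodes the Gujarati (U+0A80) and Devanagari (U+0900) blocks in parallel, so most letters differ by a fixed offset. `LEXICON`, `SUFFIX_MAP`, the proper-noun tag `"NNP"`, and the function names are hypothetical toy choices, and POS tags are supplied by hand rather than by a real tagger.

```python
# Minimal sketch of POS- and stemmer-assisted Gujarati-to-Hindi transliteration.
# All tables below are hypothetical toy data, not the paper's resources.

# Unicode lays out Gujarati (U+0A80) and Devanagari (U+0900) in parallel,
# ISCII-derived blocks, so most letters differ by a fixed offset.
OFFSET = 0x0A80 - 0x0900
GUJARATI_FIRST, GUJARATI_LAST = 0x0A81, 0x0AEF  # signs, letters, matras, digits

# Hypothetical bilingual lexicon: Gujarati root -> Hindi root.
LEXICON = {"પાણી": "पानी"}  # "water": the naive mapping would give पाणी

# Hypothetical suffix table: these Gujarati case suffixes are written attached
# to the noun, while their Hindi counterparts are separate postpositions.
SUFFIX_MAP = {"માં": " में", "થી": " से"}


def transliterate(text: str) -> str:
    """Baseline: map each Gujarati code point to Devanagari by the fixed offset."""
    return "".join(
        chr(ord(ch) - OFFSET) if GUJARATI_FIRST <= ord(ch) <= GUJARATI_LAST else ch
        for ch in text
    )


def stem(word: str) -> tuple[str, str]:
    """Strip the longest known suffix, keeping a non-empty root."""
    for suffix in sorted(SUFFIX_MAP, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def translate_token(word: str, pos_tag: str) -> str:
    """Translate known roots; transliterate proper nouns and unknown roots."""
    root, suffix = stem(word)
    if pos_tag == "NNP":  # hypothetical proper-noun tag: always transliterate
        hindi_root = transliterate(root)
    else:
        hindi_root = LEXICON.get(root, transliterate(root))
    return hindi_root + SUFFIX_MAP.get(suffix, "")


def translate_sentence(tagged: list[tuple[str, str]]) -> str:
    """Process a sentence given as (word, POS tag) pairs from some tagger."""
    return " ".join(translate_token(word, tag) for word, tag in tagged)


if __name__ == "__main__":
    print(transliterate("પાણી"))                 # पाणी (naive mapping)
    print(translate_token("પાણીમાં", "NN"))      # पानी में (root translated)
    print(translate_token("અમદાવાદમાં", "NNP"))  # अमदावाद में (proper noun)
```

Stemming runs before the lexicon lookup so that an inflected form such as પાણીમાં still finds its root, while a word tagged as a proper noun skips the lexicon and is only transliterated. A practical system would need a trained tagger and a much larger lexicon; this toy suffix stripper will also over-strip any word that merely ends in a listed suffix.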
Similar resources
Translation rules for english to hindi machine translation system: homoeopathy domain
Rule based machine translation system embraces a set of grammar rules which are mandatory for the mapping of syntactic representations of a source language, on the target language. The system necessitates good linguistic knowledge to write rules and require of acquaintance source such as corpus and bilingual dictionary. In this paper, we have described the grammar rules intended for our English...
Improving Machine Translation via Triangulation and Transliteration
In this paper we improve Urdu→Hindi English machine translation through triangulation and transliteration. First we built an Urdu→Hindi SMT system by inducing triangulated and transliterated phrase-tables from Urdu–English and Hindi–English phrase translation models. We then use it to translate the Urdu part of the Urdu-English parallel data into Hindi, thus creating an artificial Hindi-English...
POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments
We discuss Part-of-Speech(POS) tagging of Hindi-English Code-Mixed(CM) text from social media content. We propose extensions to the existing approaches, we also present a new feature set which addresses the transliteration problem inherent in social media. We achieve an 84% accuracy with the new feature set. We show that the context and joint modeling of language detection and POS tag layers do...
English to Hindi Paraphrase Convention for Translating Homoeopathy Literature
The rule based approach to machine translation (MT) confines grammatical rules between the source and the target language with the goal of constructing grammatical translation between the language pair. In this paper, we describe the structural representation of English stemmer, POS tagging and design transfer rules which can generate Hindi sentence from the structural representation of the Eng...
Urdu and Hindi: Translation and sharing of linguistic resources
Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have also diverged significantly especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error rate at 18%) between the two languages even in absence of a parallel corpus. Linguistic resour...
Journal title:
- CoRR
Volume: abs/1307.3310
Publication date: 2013